28 research outputs found

    Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

    Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methodologies to annotate the data. The systems were submitted to the adverse drug reaction shared task, organised during the Text Analysis Conference in 2017 by the National Institute of Standards and Technology, achieving F1-scores of 76.00 and 75.61 respectively. Comment: Paper describing submission for TAC ADR shared tas

    GNTeam at 2018 n2c2: Feature-augmented BiLSTM-CRF for drug-related entity recognition in hospital discharge summaries

    Monitoring the administration of drugs and adverse drug reactions are key parts of pharmacovigilance. In this paper, we explore the extraction of drug mentions and drug-related information (reason for taking a drug, route, frequency, dosage, strength, form, duration, and adverse events) from hospital discharge summaries through deep learning that relies on various representations for clinical named entity recognition. This work was officially part of the 2018 n2c2 shared task, and we use the data supplied as part of the task. We developed two deep learning architectures based on recurrent neural networks and pre-trained language models. We also explore the effect of augmenting word representations with semantic features for clinical named entity recognition. Our feature-augmented BiLSTM-CRF model achieved an F1-score of 92.67% and ranked 4th in the entity extraction sub-task among the systems submitted to the n2c2 challenge. The recurrent neural networks that use pre-trained domain-specific word embeddings and a CRF layer for label optimization perform drug, adverse event and related entity extraction with a micro-averaged F1-score of over 91%. Word embeddings that are pre-trained on a large unannotated corpus of relevant documents and further fine-tuned to the task perform rather well. However, augmenting the word embeddings with semantic features extracted using available clinical NLP toolkits can further improve the performance (primarily by boosting precision) of drug-related named entity recognition from electronic health records.
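    The feature augmentation mentioned in this abstract can be illustrated with a minimal sketch. It assumes that augmentation means concatenating a one-hot semantic-type vector to each word embedding; the abstract does not specify the mechanism. The type inventory, toy embeddings and function names below are hypothetical and are not taken from the paper.

```python
# Minimal sketch: augment word embeddings with a semantic feature.
# ASSUMPTION: augmentation is done by concatenation; the semantic types,
# toy vectors and helper names are hypothetical illustrations only.

SEMANTIC_TYPES = ["DRUG", "DISEASE", "NONE"]  # hypothetical type inventory

# Toy 2-dimensional "pre-trained" embeddings (made-up values).
TOY_EMBEDDINGS = {"aspirin": [0.2, -0.1], "daily": [0.0, 0.3]}

# Toy lookup standing in for the output of a clinical NLP toolkit.
TOY_TYPES = {"aspirin": "DRUG"}


def one_hot(label, labels=SEMANTIC_TYPES):
    """Return a one-hot list encoding `label` over `labels`."""
    return [1.0 if label == candidate else 0.0 for candidate in labels]


def augment(embedding, semantic_type):
    """Concatenate the word embedding with the one-hot semantic feature."""
    return list(embedding) + one_hot(semantic_type)


def featurize(tokens):
    """Build one augmented vector per token; unknown words get zeros."""
    return [
        augment(TOY_EMBEDDINGS.get(tok, [0.0, 0.0]), TOY_TYPES.get(tok, "NONE"))
        for tok in tokens
    ]


if __name__ == "__main__":
    for token, vector in zip(["aspirin", "daily"], featurize(["aspirin", "daily"])):
        print(token, vector)
```

    In a full system, vectors like these would be fed to the sequence tagger in place of the plain embeddings.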

    Point mutations affecting yeast prion propagation change the structure of its amyloid fibrils

    We investigated the effect of point substitutions in the N-terminal domain of the yeast prion protein Sup35 (Sup35NMp) on the structure of its amyloid fibrils. For the study, we chose proteins with mutations that influence [PSI+] prion propagation differently but do not prevent the aggregation of Sup35NMp in vitro. The use of a wide range of physico-chemical methods allowed us to show significant differences in the structure of these aggregates, their physical size, and their clumping tendency. We also demonstrated that the fluorescent probe thioflavin T (ThT) can be successfully used to investigate subtle changes in the structural organization of fibrils formed from various Sup35NMp. The obtained results and our theoretical predictions allowed us to conclude that some of the selected amino acid substitutions delimit the region of the protein that forms the core of amyloid fibrils, and change the fibril structure. The relationship between the structural features of in vitro Sup35NMp amyloid aggregates and the stability of the [PSI+] prion in vivo allowed us to suggest that oligopeptide repeats (R) of the amyloidogenic N-terminal domain of Sup35NMp from R0 to R2 play a key role in protein aggregation. Their arrangement, rather than just their presence, is critical for propagation of the strong [PSI+] prion variants. The results confirm the suitability of the proposed combination of theoretical and empirical approaches for identifying changes in amyloid fibril structure, which, in turn, can significantly affect both the functional stability of amyloid fibrils and their pathogenicity. Laboratorio de Investigación y Desarrollo de Bioactivo

    Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task

    Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
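    The "micro-averaged F1-score over two classes" reported for subtask-2 can be made concrete with a short sketch. It pools true positives, false positives and false negatives across the chosen classes before computing precision and recall. The label names and the toy gold/predicted lists are hypothetical and are not drawn from the shared-task data; this is the standard definition of micro-averaging, not the organizers' evaluation script.

```python
# Minimal sketch: micro-averaged F1 restricted to a subset of classes.
# ASSUMPTION: label names ("intake", "possible", "none") are hypothetical.

def micro_f1(gold, pred, classes):
    """Micro-averaged F1 over `classes`, pooling counts across classes."""
    tp = fp = fn = 0
    for cls in classes:
        for g, p in zip(gold, pred):
            if p == cls and g == cls:
                tp += 1          # correctly predicted this class
            elif p == cls and g != cls:
                fp += 1          # predicted this class, gold disagrees
            elif p != cls and g == cls:
                fn += 1          # missed an instance of this class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["intake", "possible", "none", "intake", "none"]
    pred = ["intake", "none", "none", "possible", "intake"]
    # Score only the two classes of interest; "none" is ignored as a target.
    print(round(micro_f1(gold, pred, ["intake", "possible"]), 3))
```

    Because counts are pooled before averaging, frequent classes weigh more heavily than rare ones, which matters on imbalanced social media data.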

    Using Twitter to mine sleep related information from people who declare a diagnosis of a psychotic disorder

    No full text
    ABSTRACT Objectives: Our group has investigated the occurrence of psychotic(-like) experiences (PLEs) in Twitter posts, namely auditory hallucinations. Tweets classified as potentially related to auditory hallucinations were proportionately higher between 23:00 and 5:00 in comparison to tweets not classified. This may indicate a clinically significant relationship between sleep and PLEs in the general population, a notion supported by the literature. Based on our previous investigation, the current study aimed to explore whether this methodology could be amended to generate datasets regarding sleep experiences in people who self-report a diagnosis of a psychotic disorder. Approach: The current investigation seeks to establish whether it is feasible to generate anonymised datasets regarding sleep by extracting information from the timelines of people who self-report a psychotic diagnosis. A text mining method was implemented that utilised rule-based semantic filters to identify self-reported diagnoses. This focused on occurrences of personal and possessive pronouns to detect the subjectivity of tweets, as well as potential diagnostic verb indicators and any mentions of other related factors. For each diagnostic tweet, we collected information from user timelines. A sleep-related classifier was then implemented, which used lexical features (e.g. bag-of-words, part-of-speech tags) to predict whether a given tweet refers to a sleep-related experience. Results: After training the classifier on the bag-of-words model, the most informative words which contributed to the performance of the classifier were: ‘sleep’, ‘can’t awake’, ‘never’, ‘stress’. Part-of-speech tags (e.g. verbs, adverbs) were also important features. The classification accuracy of the ‘bag-of-words’ model was better than that of the ‘part-of-speech’ model. Through the method outlined herein, we were able to improve the quality of the generated datasets in comparison to the previous investigation. This methodology also facilitated the mining of the timelines of individual Twitter users who stated a personal diagnosis. To this end, an additional filter was implemented to identify tweets regarding sleep experience. The potential relationship between sentiment and temporality expressed in diagnosis and sleep experiences is also discussed. Conclusions: The results from this study have implications for mental health research on Twitter. Specifically, the refinements in the methodology enabled retrieval of two high-quality datasets regarding psychosis and sleep. It is therefore feasible that other psychosis-related phenomena (e.g. visual hallucinations, delusions, medication) could also be applied as separate filters to create one dataset of psychosis-related experiences within individuals diagnosed with psychosis.
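    The rule-based filter described in this abstract (personal and possessive pronouns combined with diagnostic verb indicators) might look roughly like the sketch below. The word lists, the disorder terms and the rule that all three cues must co-occur are assumptions made for illustration; the abstract does not give the study's actual lexicons or rules.

```python
# Minimal sketch: rule-based filter for self-reported diagnosis tweets.
# ASSUMPTION: the lexicons and the "all three cues must co-occur" rule
# are hypothetical; the study's actual filters are not specified here.
import re

FIRST_PERSON = {"i", "i'm", "i've", "my", "me", "mine"}
DIAGNOSTIC_VERBS = {"diagnosed", "have", "suffer", "suffering"}
DISORDER_TERMS = {"schizophrenia", "psychosis", "schizoaffective"}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_self_reported_diagnosis(tweet):
    """Flag tweets with a first-person cue, a diagnostic verb and a disorder term."""
    tokens = set(tokenize(tweet))
    return (
        bool(tokens & FIRST_PERSON)
        and bool(tokens & DIAGNOSTIC_VERBS)
        and bool(tokens & DISORDER_TERMS)
    )


if __name__ == "__main__":
    examples = [
        "I was diagnosed with schizophrenia last year",
        "New study on schizophrenia treatment",
    ]
    for tweet in examples:
        print(is_self_reported_diagnosis(tweet), "|", tweet)
```

    A set-based check like this ignores word order and negation, so a real filter would need further rules to reduce false positives.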